@*
* Copyright 2016 LinkedIn Corp.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*@
<p>There are some spark property settings that are known to be good choices in most of the scenarios. Spark users are
  highly encouraged to adopt best practices whenever possible. Unaligned property settings will either yield to higher
  than severe level or merely moderate warnings. Please also refer to
  <a href="https://spark.apache.org/docs/1.2.0/configuration.html" target="_blank">Spark Configuration Doc</a>
  for property tuning beyond this page's suggestions. </p>
<h3>Suggestions</h3>

<h4><strong>spark.serializer</strong></h4>
<p>
  Class to use for serializing objects that will be sent over the network or need to be cached in serialized form.
  The default of Java serialization works with any Serializable Java object but is quite slow, so we recommend using
  <strong>org.apache.spark.serializer.KryoSerializer</strong> and configuring Kryo serialization whenever possible.
  This also becomes a default choice in Spark 1.3.
</p>

<h4><strong>spark.driver.memory</strong></h4>
<p>
  <strong>spark.driver.memory</strong> has a default value of <strong>512m</strong>. Allocating more memory than default
  is generally acceptable, but users should realize that a too large driver memory ask against the cluster could yield
  to long queueing time. Generally we would recommend to keep the memory allocation <strong><=8g</strong>.
</p>

<h4><strong>spark.shuffle.manager</strong></h4>
<p>
  Implementation to use for shuffling data. Available choices are <strong>shuffle</strong> or <strong>sort</strong>.
  Sort-based shuffle is more memory-efficient and is the default option starting in 1.2. We'd recommend using
  <strong>sort</strong> in almost all scenario.
</p>


<h4><strong>spark.executor.cores</strong></h4>
<p>
  <strong>spark.executor.cores</strong> has a default value of <strong>1</strong>. Our Hadoop 2 clusters currently turn
  off CPU scheduling, even if you specify a large number for executor-cores, your Spark executor is not guaranteed to
  get the specified number of virtual cores (vCores). This will cause the executor to run more concurrent tasks than
  vCores available to it, causing frequent context switching and eventually slowing down your application
  with high failure rate. We suggest setting <strong>executor-cores <= 2</strong>.
</p>
